
[DPE-7316] Stereo mode unified charm #1630

Draft
dragomirp wants to merge 102 commits into stereo-mode-additive-code from dragop/stereo-mode-unified-charm

Conversation

@dragomirp
Contributor

Issue

Solution

Checklist

  • I have added or updated any relevant documentation.
  • I have cleaned any remaining cloud resources from my accounts.

marceloneppel and others added 30 commits January 27, 2026 08:54
Add a lightweight witness/voter charm that participates in Raft
consensus to provide quorum in 2-node PostgreSQL clusters without
storing any PostgreSQL data.

Key components:
- Watcher charm with Raft controller integration
- Health checking for PostgreSQL endpoints
- Relation interface (postgresql_watcher) for PostgreSQL operator
- Topology and health check actions

Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>
… pysyncobj Raft service

Add standalone raft_service.py that implements KVStoreTTL-compatible
Raft node managed as a systemd service, eliminating the dependency on
the charmed-postgresql snap. Remove automatic health checks in favor of
on-demand checks via action, since the watcher lacks PostgreSQL credentials.

Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>
…tereo mode tests

Replace cut_network_from_unit_without_ip_change with cut_network_from_unit
in stereo mode integration tests. The iptables-based approach with REJECT
was still causing timeouts; removing the interface entirely triggers faster
TCP connection failures. Added use_ip_from_inside=True for check_writes
since restored units get new IPs. Also adds spread task for stereo mode tests.

Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>
Add Raft member proactively during IP change to prevent race conditions
where member restarts Patroni before being added to cluster. Implement
watcher removal from Raft on relation departure to maintain correct
quorum calculations. Add idempotency check before adding watcher to Raft.
Use fresh peer IPs for Raft member addition instead of cached values.
Update stereo mode tests with iptables-based network isolation and Raft
health verification.

Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>
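
The idempotency check and the removal on relation departure described in the commit above can be sketched as follows. This is a minimal illustration, not the charm's code: the membership is modelled as a plain set of addresses, and the names `ensure_raft_member`, `remove_raft_member`, and `quorum_size` are hypothetical.

```python
def ensure_raft_member(members: set[str], address: str) -> bool:
    """Add address only if it is missing.

    Returns True when the set changed and False when the call was a no-op,
    so repeated calls for the same address are safe.
    """
    if address in members:
        return False
    members.add(address)
    return True


def remove_raft_member(members: set[str], address: str) -> bool:
    """Remove address if present, so quorum is computed from current members."""
    if address not in members:
        return False
    members.discard(address)
    return True


def quorum_size(members: set[str]) -> int:
    """Number of votes that forms a majority for this membership."""
    return len(members) // 2 + 1
```

With two PostgreSQL units plus one watcher the majority is two votes, which is why a departed watcher that stays in the member list would distort the quorum calculation.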
…o tests

Build the watcher charm automatically if not found and deploy charms
sequentially instead of concurrently to improve reliability.

Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>
- Add idempotency check to skip deployment if already in expected state
- Clean up unexpected state before redeploying to avoid test pollution
- Add wait_for_idle after replica shutdown to allow cluster stabilization

Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>
…fy_raft_cluster_health call

- Add use_ip_from_inside=True to test_watcher_network_isolation to handle stale IPs
- Fix verify_raft_cluster_health call in test_health_check_action to pass required arguments

Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>
Add __expire_keys and _onTick methods to WatcherKVStoreTTL to match
Patroni's KVStoreTTL behavior. When the watcher becomes the Raft leader
(e.g., when PostgreSQL primary is network-isolated), it must expire
stale leader keys so that a replica can acquire leadership.

Without this fix, the watcher would become Raft leader but wouldn't
process TTL expirations, causing the old Patroni leader key to remain
valid and preventing failover.

Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>
Juju action results require hyphenated keys (e.g., 'healthy-count')
rather than underscored keys. Fixed the health check action to use
proper key format and updated test expectations.

Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>
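
One way to avoid this class of bug is to normalise result keys before handing them to the action. The helper below is a hypothetical sketch (`to_action_results` is not a name from the charm) that rewrites underscored keys into the hyphenated form.

```python
def to_action_results(results: dict[str, object]) -> dict[str, object]:
    """Return a copy of results with underscores in keys replaced by hyphens."""
    return {key.replace("_", "-"): value for key, value in results.items()}
```

For example, `to_action_results({"healthy_count": 2})` yields a dictionary keyed by `healthy-count`.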
…sues

- Add watcher PostgreSQL user for health check authentication:
  - Create 'watcher' user with password via relation secret
  - Add pg_hba.conf entry for watcher IP in patroni.yml template
  - Pass password from relation secret to health checker

- Fix lint issues:
  - Extract S3 initialization to _handle_s3_initialization() to reduce
    _on_peer_relation_changed complexity from 11 to 10
  - Use absolute paths for subprocess commands (/usr/bin/systemctl, etc.)
  - Update type hints to use modern syntax (X | None vs Optional[X])
  - Fix line length formatting issues

- Fix unit test failures:
  - Add missing mocks in test_update_member_ip for endpoint methods
  - Add _units_ips mock in test_update_relation_data_leader

- Fix integration test:
  - Add check_watcher_ip parameter to verify_raft_cluster_health()
    to handle watcher IP changes after network isolation tests

- Update watcher charm to handle IP changes:
  - Add _update_unit_address_if_changed() for IP change detection
  - Call from config-changed and update-status events

Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>
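
The IP change detection mentioned in the last bullet group can be pictured roughly like this. The sketch assumes the address is published under a `unit-address` key in a dictionary-like databag; that key name and the function signature are assumptions, not taken from the charm.

```python
def update_unit_address_if_changed(databag: dict[str, str], current_ip: str) -> bool:
    """Publish current_ip only when it differs from the stored value.

    Returns True when the databag was updated, so the caller can decide
    whether follow-up work (for example, re-registering with Raft) is needed.
    """
    if databag.get("unit-address") == current_ip:
        return False
    databag["unit-address"] = current_ip
    return True
```

Calling such a check from both config-changed and update-status handlers is cheap because it is a no-op when nothing changed.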
Remove outdated constraint about deploy order being critical for
stereo mode with Raft DCS. Testing confirmed that 2 PostgreSQL
units can now be deployed simultaneously without causing split-brain.

Also update deprecated relate() calls to integrate().

Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>
* add new-tab-link extension and increase linkcheck timeout

Signed-off-by: andreia <andreia.velasco@canonical.com>

* replace mentions of old Juju password actions with Juju secrets

Signed-off-by: andreia <andreia.velasco@canonical.com>

* update links to 16 repo and remove mention of 14 bundle

Signed-off-by: andreia <andreia.velasco@canonical.com>

* update instructions for secrets retrieval

---------

Signed-off-by: andreia <andreia.velasco@canonical.com>
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
* refactor home page

* fix missing refs
* add new stable releases to releases.md

* invert order (newest to oldest)

* Update release in refresh docs

* correct architecture for 990, 989

* correct arch for 952, 951

---------

Co-authored-by: Carl Csaposs <carl.csaposs@canonical.com>
Integrate the watcher charm as a mode within the main PostgreSQL charm,
following the MongoDB pattern of using a config `role` option to alternate
between "postgresql" (default) and "watcher" modes.

Key changes:
- Add `role` config option (postgresql|watcher), immutable after deploy
- Rename provides relation `watcher` to `watcher-offer` for PostgreSQL mode
- Add requires relation `watcher` for watcher mode
- Branch charm __init__ based on role: watcher mode skips snap install,
  Patroni, backups, TLS, etc. and only runs Raft + health checker
- Move watcher source files (raft_controller, raft_service, watcher_health)
  into main src/
- Create WatcherRequirerHandler for watcher-mode event handling
- Persist role in peer databag and block on role change attempts
- Update integration tests for unified charm deployment

Deploy example:
  juju deploy postgresql pg
  juju deploy postgresql pg-watcher --config role=watcher
  juju relate pg:watcher-offer pg-watcher:watcher

Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>
The @trace_charm decorator expects tracing_endpoint attribute to exist
after __init__. In watcher mode we return early, so set it to None.

Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>
* Limit bucket listing to find the timelines

* Add ceph pitr test

* Switch back to recurse

* Refactor tests

* Fix imports

* Fix tests

* Reduce boto logs

* Typo
* Cleanup config code

* Merge update sync config in the bulk patch call

* Add storage-hot-standby-feedback and durability-maximum-lag-on-failover

* Fix default

* Remove extra patch

* Update to spec
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
* Move TLS transfer to single kernel

* Switch to released lib
* add instructions for custom usernames to integration guide

* Update docs/how-to/integrate-with-another-application.md

Co-authored-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>
Signed-off-by: Andreia <andreia.velasco@canonical.com>

---------

Signed-off-by: Andreia <andreia.velasco@canonical.com>
Co-authored-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>
…ailable) (#1318)

* DPE-8980 Support Juju 4: use 'ip' databag field (overwrites 'private-address')

Juju 4 has removed support for the databag fields `private-address`, `ingress-address`, and more.
The field to use now is `ip`. The PG16 charm still has to support Juju 3.6 LTS,
so this adds support for the `ip` field with backward compatibility.

Users can deploy it on Juju 4 using:
> juju deploy postgresql --channel 16/edge --force

* Address comments in PR
@dragomirp dragomirp added the enhancement New feature, UI change, or workload upgrade label Apr 18, 2026
@github-actions github-actions Bot added the Libraries: OK The charm libs used are OK and in-sync label Apr 18, 2026
    content = secret.get_content(refresh=True)
    return content.get("raft-password")
except SecretNotFoundError:
    logger.warning(f"Secret {secret_id} not found")
Comment thread src/relations/watcher_requirer.py Fixed
@dragomirp dragomirp force-pushed the dragop/stereo-mode-unified-charm branch from 4a5a1af to 56bc4ce Compare April 18, 2026 15:17
@dragomirp dragomirp force-pushed the dragop/stereo-mode-unified-charm branch from 0abd2aa to 439e345 Compare April 20, 2026 19:19
@dragomirp dragomirp force-pushed the dragop/stereo-mode-unified-charm branch from 1678751 to 80d1906 Compare April 20, 2026 20:10
@dragomirp dragomirp force-pushed the dragop/stereo-mode-unified-charm branch from 5ab2b49 to b4f5c14 Compare April 22, 2026 12:43
@dragomirp dragomirp changed the title [MISC] Dragop/stereo mode unified charm [DPE-7316] Stereo mode unified charm Apr 23, 2026
@dragomirp dragomirp force-pushed the dragop/stereo-mode-unified-charm branch from 879bede to 9aaafc7 Compare April 23, 2026 23:47
    content = secret.get_content(refresh=True)
    return content.get("watcher-password")
except SecretNotFoundError:
    logger.warning(f"Secret {secret_id} not found")
@dragomirp dragomirp changed the base branch from 16/edge to stereo-mode-additive-code April 24, 2026 16:08

Labels

  • enhancement: New feature, UI change, or workload upgrade
  • Libraries: OK: The charm libs used are OK and in-sync

Projects

None yet

7 participants